Speech/music discrimination based on posterior probability features
Abstract
A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. This network estimates the posterior probability that the acoustic feature vectors at the current time step should be labelled as each of around 50 phone classes. We sought to exploit informal observations of the distinctions in this posterior domain between nonspeech audio and speech segments well-modeled by the network. We describe four statistics that successfully capture these differences, and which can be combined to make a reliable speech/nonspeech categorization that is closely related to the likely performance of the speech recognizer. We test these features on a database of speech/music examples, and our results match the previously-reported classification error, based on a variety of special-purpose features, of 1.4% for 2.5 second segments. We also show that recognizing segments ordered according to their resemblance to clean speech can result in an error rate close to the ideal minimum over all such subsetting strategies.
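As a rough illustration of what a posterior-domain statistic can look like, the sketch below computes two simple summaries of a matrix of per-frame phone posteriors. The abstract does not name its four statistics, so both measures here (mean per-frame entropy and mean frame-to-frame change) are assumptions chosen for illustration, and the function names are hypothetical. The intuition is that a network confident about well-modeled speech yields peaked, rapidly changing posteriors, while poorly modeled audio yields flatter ones.

```python
import numpy as np

def mean_frame_entropy(post):
    """Average per-frame entropy, in bits, of a (frames, classes) posterior matrix."""
    p = np.clip(post, 1e-12, 1.0)  # avoid log(0) for classes with zero posterior
    return float(np.mean(-np.sum(p * np.log2(p), axis=1)))

def mean_dynamism(post):
    """Average squared frame-to-frame change in the posterior vectors."""
    diff = np.diff(post, axis=0)  # shape (frames - 1, classes)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy inputs (not real network outputs): flat posteriors over 50 classes,
# and confident posteriors that alternate between two classes.
flat = np.full((10, 50), 1.0 / 50)
confident = np.eye(50)[[0, 1, 0, 1]]

print(mean_frame_entropy(flat), mean_dynamism(flat))
print(mean_frame_entropy(confident), mean_dynamism(confident))
```

In practice such statistics would be computed over a segment (for example the 2.5 second segments mentioned above) and passed to a separate classifier; that step is not shown here.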
Similar resources
A Sphinx Based Speech-music Segmentation Front-end for Improving the Performance of an Automatic Speech Recognition System in Turkish
In this study, a system that segments an audio signal into speech and music using posterior-probability-based features is proposed and implemented in Sphinx. Unlike earlier efforts that use Multi-Layer Perceptrons (MLP), this system uses Hidden Markov Model-based acoustic models that are trained in Sphinx for posterior probability calculations. Acoustic Models are trained with the HMM-state...
Experiments on Speech/Music Discrimination
The problem of speech/music discrimination has become increasingly important as automatic speech recognition systems are applied to more real-world multimedia domains. One of the issues in the design of a signal classifier is the selection of an appropriate feature set that captures the temporal and spectral structures of the signal. Many features have been used in speech/music discrimination. Th...
Submitted to Eurospeech’99, Budapest SPEECH/MUSIC DISCRIMINATION BASED ON POSTERIOR PROBABILITY FEATURES
A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. This network estimates the posterior probability that the acoustic feature vectors at the current time step should be labelled as each of around 50 phone classes. We sought to exploit informal observations of the distinctions in this posterior domain between nonspeech audio and speech segments well-modeled b...
A wavelet-based parameterization for speech/music segmentation
Speech/music discrimination is a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose to use a wavelet-based decomposition of the audio signal, which allows a good analysis of non-stationary signals such as speech or music. We compute different...
Robust singing detection in speech/music discriminator design
In this paper, an approach for robust singing signal detection in speech/music discrimination is proposed and applied to audio indexing. Conventional approaches to speech/music discrimination can provide reasonable performance with regular music signals but often perform poorly with singing segments. This is due mainly to the fact that speech and singing signals are extremely cl...